Steps data analysis

Introduction

Data set

The variables included in the data set are:

Field Description
AmountWeek How many cups of coffee do you typically consume weekly?
AmountOutMonth How often do you drink coffee out of home per month on average?
MoneyCoffee How much money on average do you estimate you spend on coffee per month?
MoneyGroceries How much on average do you spend on general groceries per month?
Machine How do you brew your coffee at home?
BrandChange How often do you switch between coffee brands?
PurchaseLocation Where do you usually purchase your coffee?
Supermarket_Positive_Reasons When you purchase coffee from the supermarket, what are your main reasons for doing so?
Supermarket_Negative_Reasons What would be reasons why you would not purchase coffee from the supermarket?
Criteria_Type_Coffee What are your main criteria or evaluation points for choosing the type of coffee?
KnowledgeCoffee How would you describe your knowledge level regarding coffee in general?
Purchase_Price I believe that the ____ is important to my decision on which coffee to purchase.
Purchase_Sustainability I believe that the ____ is important to my decision on which coffee to purchase.
Purchase_Sustainability I believe that the ____ is important to my decision on which coffee to purchase.
Purchase_Fairtrade I believe that the ____ is important to my decision on which coffee to purchase.
Purchase_Packaging I believe that the ____ is important to my decision on which coffee to purchase.
Frequency_Specialty How often do you drink specialty coffee?
Subscription_Likely How likely are you to have an online subscription for (specialty) coffee?
Subscription_Not_Likely What is the number one reason why you would be hesitant?
App_Likely How likely are you to value and use an app for your online subscription?
Gender What is your gender?
AgeCategory What is your age category?
Occupation What is your occupational status?
Education What level of education have you completed?
Home How would you describe the place you currently live in?

Univariate descriptions - Categorical variables

Age category

Age Category Absolute Relative
< 18 2 0.85%
18-25 72 30.64%
25-45 101 42.98%
45-60 49 20.85%
> 60 11 4.68%
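
The Relative column in these frequency tables is each absolute count divided by the number of respondents. A minimal Python sketch of that calculation, using the age category counts above (the function and variable names are illustrative and not taken from the original analysis):

```python
# Absolute counts copied from the age category table.
age_counts = {"< 18": 2, "18-25": 72, "25-45": 101, "45-60": 49, "> 60": 11}


def relative_frequencies(counts):
    """Return each category's share of the total as a percentage (2 decimals)."""
    total = sum(counts.values())
    return {category: round(100 * n / total, 2) for category, n in counts.items()}


age_shares = relative_frequencies(age_counts)

# Print the table in the same Absolute / Relative layout.
for category, share in age_shares.items():
    print(f"{category:>6} {age_counts[category]:>4} {share:6.2f}%")
```

The same function applies unchanged to the other single-answer tables in this section, since each of them sums to the same number of respondents.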

Home

Home Absolute Relative
Rural (Town) 24 10.21%
Suburbs 18 7.66%
Urban (City) 193 82.13%

Gender

Gender Absolute Relative
Female 153 65.11%
Male 80 34.04%
Other 2 0.85%

Education

Education Absolute Relative
Elementary school 3 1.28%
High school 22 9.36%
Associate degree 19 8.09%
Bachelor’s degree 128 54.47%
Master 59 25.11%
PhD 4 1.70%

Machine

Machine Absolute Relative
Aeropress 1 0.43%
CupMachine 74 31.49%
Espresso machine 75 31.91%
Filter machine 48 20.43%
French press 9 3.83%
Instant coffee 5 2.13%
Moka pot 18 7.66%
Percolator 1 0.43%
V60 4 1.70%

Brand change

Brand change Absolute Relative
Never 77 32.77%
Sometimes 132 56.17%
Very often 23 9.79%
Every time 3 1.28%

Purchase Method

Purchase Method Absolute Relative
E-commerce 40 17.02%
Online subscription 14 5.96%
Specialty stores or cafés 29 12.34%
The supermarket 152 64.68%

Multiple option answers:

Reasons buying from the supermarket

Reasons Frequency
I am satisfied with the product 90
Price 71
Time-saving 56
Convenience 53
I do not purchase coffee from the supermarket 45
I do not have specialty stores near where I live 16
Other 3
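
Because respondents could select several options, the frequencies of a multiple-option question can add up to more than the number of respondents (here they sum to 334 for 235 respondents). A minimal sketch of how such answers can be tallied, assuming each response is stored as one string with options separated by semicolons; the storage format and the sample responses are hypothetical, not taken from the survey file:

```python
from collections import Counter


def count_options(responses, separator=";"):
    """Count how many respondents selected each option.

    Each option is counted at most once per respondent.
    """
    counts = Counter()
    for response in responses:
        options = {part.strip() for part in response.split(separator) if part.strip()}
        counts.update(options)
    return counts


# Hypothetical responses, for illustration only.
responses = [
    "Price; Convenience",
    "Price",
    "Time-saving; Price; Convenience",
    "Other",
]

counts = count_options(responses)
for option, frequency in counts.most_common():
    print(option, frequency)
```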

Reasons for not buying from the supermarket

Reasons Frequency
No reason 102
Better quality elsewhere 96
Not enough variety 28
Not wanting to support big corporations 22
It is not fresh 17
Lack of sustainable options 8
I don’t buy from supermarkets 7
Price 2

Criteria for choosing the type of coffee

Reasons Frequency
Flavour profile 149
Price 89
Roast level 64
Origin 38
Arabica or Robusta 18
Sustainability & Fair Trade 16

Purchase decisions 1-5

Price

Purchase decision - price Absolute Relative
1 25 10.64%
2 55 23.40%
3 58 24.68%
4 52 22.13%
5 45 19.15%

Sustainability

Purchase decision - sustainability Absolute Relative
1 18 7.66%
2 38 16.17%
3 84 35.74%
4 60 25.53%
5 35 14.89%

Certificates

Purchase decision - certificate Absolute Relative
1 44 18.72%
2 63 26.81%
3 80 34.04%
4 35 14.89%
5 13 5.53%

Fairtrade

Purchase decision - fairtrade Absolute Relative
1 22 9.36%
2 37 15.74%
3 77 32.77%
4 63 26.81%
5 36 15.32%

Packaging

Purchase decision - packaging Absolute Relative
1 70 29.79%
2 64 27.23%
3 49 20.85%
4 37 15.74%
5 15 6.38%

Combined data

Importance Price Sustainability Certificates Fairtrade Packaging
1 25 18 44 22 70
2 55 38 63 37 64
3 58 84 80 77 49
4 52 60 35 63 37
5 45 35 13 36 15
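
A combined table of this kind can be assembled column by column from the five per-criterion tables. The sketch below does that with the absolute counts listed in the Price, Sustainability, Certificates, Fairtrade and Packaging tables, and checks that every column adds up to the same number of respondents (names are illustrative):

```python
# Counts for importance ratings 1..5, copied from the per-criterion tables.
ratings = {
    "Price":          [25, 55, 58, 52, 45],
    "Sustainability": [18, 38, 84, 60, 35],
    "Certificates":   [44, 63, 80, 35, 13],
    "Fairtrade":      [22, 37, 77, 63, 36],
    "Packaging":      [70, 64, 49, 37, 15],
}


def combine(ratings):
    """Return one row per importance level: [level, count for each criterion]."""
    columns = list(ratings.values())
    return [[level] + [column[level - 1] for column in columns] for level in range(1, 6)]


combined = combine(ratings)
totals = {name: sum(column) for name, column in ratings.items()}

print("Importance", *ratings.keys())
for row in combined:
    print(*row)
print("Total", *totals.values())
```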

Frequency specialty coffee consumption

Frequency specialty coffee consumption Absolute Relative
I do (did) not know what this is 55 23.40%
Never 41 17.45%
Only in cafes 47 20.00%
Sometimes 63 26.81%
Always 29 12.34%

Reasons for not being likely to set up a subscription

Reasons Frequency
I do not like being stuck with subscriptions 111
I am happy with my coffee now 109
The price 56
I do not consume enough coffee at home 22
The packaging that is required for delivery 15
No reason 15
I already have a subscription 10
Other 4


Univariate descriptions - Numerical variables

Amount of coffee consumed weekly

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   1.00   10.00   15.00   18.48   25.00   70.00 

Amount per month out of house

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   0.00    2.00    5.00    8.03   10.00   40.00 

Money coffee

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   1.00   10.00   20.00   25.38   35.00  120.00 

Money groceries

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    0.0   160.0   200.0   247.8   300.0   900.0 

Subscription likely

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.000   1.000   3.000   3.877   6.000  10.000 

App likely

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.000   1.000   4.000   4.323   7.000  10.000 


Boxplots


Parametric testing

H_0: There is no association between the two variables.
H_a: There is an association between the two variables.
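
Pearson's chi-squared statistic compares the observed cell counts of a contingency table with the counts expected under H_0 (row total times column total, divided by the grand total). A minimal sketch on a small hypothetical 2 x 3 table; the counts are invented for illustration and are not survey data. The closed-form p-value used here holds only for 2 degrees of freedom:

```python
import math


def chi_squared(observed):
    """Return (statistic, degrees of freedom) for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # expected count under H_0
            statistic += (count - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df


# Hypothetical table: 2 groups (rows) by 3 answer categories (columns).
observed = [[30, 20, 10],
            [20, 20, 20]]

statistic, df = chi_squared(observed)

# With df == 2 the chi-squared upper tail is exactly exp(-x / 2).
p_value = math.exp(-statistic / 2) if df == 2 else None
print(round(statistic, 3), df, p_value)
```

H_0 is rejected at a chosen significance level when the p-value falls below that level.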

Age - Amount of coffee consumed


    Pearson's Chi-squared test

data:  AmountWeek and AgeCategory
X-squared = 241.68, df = 136, p-value = 0.00000006432

    Pearson's Chi-squared test with simulated p-value (based on 500
    replicates)

data:  AmountWeek and AgeCategory
X-squared = 241.68, df = NA, p-value = 0.02595
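
The simulated p-value is computed from the number of simulated tables whose statistic is at least as large as the observed one: R's `chisq.test` uses (1 + exceedances) / (B + 1), with B = 500 replicates here. The sketch below reproduces that arithmetic. The exceedance count of 12 is inferred from the reported p-value and is an assumption, not something shown in the output:

```python
def simulated_p_value(exceedances, replicates):
    """Monte Carlo p-value: (1 + exceedances) / (replicates + 1)."""
    return (1 + exceedances) / (replicates + 1)


B = 500  # number of replicates stated in the test output

# Smallest p-value attainable with 500 replicates (no simulated statistic
# reaches the observed one).
smallest = simulated_p_value(0, B)

# 12 exceedances (inferred) give the p-value reported for AmountWeek and AgeCategory.
age_amount = simulated_p_value(12, B)

print(round(smallest, 6), round(age_amount, 5))
```

This is why several simulated p-values in this section equal 0.001996: with 500 replicates the result cannot go below 1/501, however strong the association.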

Education - Amount of coffee consumed


    Pearson's Chi-squared test

data:  AmountWeek and Education
X-squared = 229.99, df = 170, p-value = 0.001491

    Pearson's Chi-squared test with simulated p-value (based on 500
    replicates)

data:  AmountWeek and Education
X-squared = 229.99, df = NA, p-value = 0.06786

Gender - Amount of coffee consumed


    Pearson's Chi-squared test

data:  AmountWeek and Gender
X-squared = 69.019, df = 68, p-value = 0.4427

    Pearson's Chi-squared test with simulated p-value (based on 500
    replicates)

data:  AmountWeek and Gender
X-squared = 69.019, df = NA, p-value = 0.3653

Home - Amount of coffee consumed


    Pearson's Chi-squared test

data:  AmountWeek and Home
X-squared = 66.506, df = 68, p-value = 0.5286

    Pearson's Chi-squared test with simulated p-value (based on 500
    replicates)

data:  AmountWeek and Home
X-squared = 66.506, df = NA, p-value = 0.5469
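
The degrees of freedom of a chi-squared test of independence are (rows - 1) x (columns - 1). Working backwards from the reported df values and the number of categories in the frequency tables suggests that AmountWeek entered these tests as a categorical variable with 35 distinct values. That is an inference from the output, not something the report states. A minimal sketch of the check:

```python
# (number of categories, reported df) for each test against AmountWeek.
# Category counts come from the univariate frequency tables.
reported = {
    "AgeCategory": (5, 136),
    "Education": (6, 170),
    "Gender": (3, 68),
    "Home": (3, 68),
}


def implied_levels(df, other_levels):
    """Solve df = (levels - 1) * (other_levels - 1) for levels."""
    quotient, remainder = divmod(df, other_levels - 1)
    if remainder:
        raise ValueError("df is not consistent with the given number of levels")
    return quotient + 1


implied = {name: implied_levels(df, levels) for name, (levels, df) in reported.items()}
print(implied)

# Average expected count per cell for the AgeCategory table, with n = 235.
n = 235
average_expected = n / (implied["AgeCategory"] * reported["AgeCategory"][0])
print(round(average_expected, 2))
```

With so many cells relative to the sample size, most expected counts are small, which is the usual reason to report a simulated p-value alongside the asymptotic one.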

App - Age


    Pearson's Chi-squared test

data:  App_Likely and AgeCategory
X-squared = 58.189, df = 36, p-value = 0.01103

    Pearson's Chi-squared test with simulated p-value (based on 500
    replicates)

data:  App_Likely and AgeCategory
X-squared = 58.189, df = NA, p-value = 0.01597

Coffee knowledge - Age


    Pearson's Chi-squared test

data:  KnowledgeCoffee and AgeCategory
X-squared = 154.32, df = 36, p-value < 0.00000000000000022

    Pearson's Chi-squared test with simulated p-value (based on 500
    replicates)

data:  KnowledgeCoffee and AgeCategory
X-squared = 154.32, df = NA, p-value = 0.001996

Coffee knowledge - Purchase location


    Pearson's Chi-squared test

data:  KnowledgeCoffee and PurchaseLocation
X-squared = 34.489, df = 27, p-value = 0.1523

    Pearson's Chi-squared test with simulated p-value (based on 500
    replicates)

data:  KnowledgeCoffee and PurchaseLocation
X-squared = 34.489, df = NA, p-value = 0.1597

Subscription - App


    Pearson's Chi-squared test

data:  Subscription_Likely and App_Likely
X-squared = 347.04, df = 81, p-value < 0.00000000000000022

    Pearson's Chi-squared test with simulated p-value (based on 500
    replicates)

data:  Subscription_Likely and App_Likely
X-squared = 347.04, df = NA, p-value = 0.001996

Subscription - Coffee knowledge


    Pearson's Chi-squared test

data:  Subscription_Likely and KnowledgeCoffee
X-squared = 109.94, df = 81, p-value = 0.01789

    Pearson's Chi-squared test with simulated p-value (based on 500
    replicates)

data:  Subscription_Likely and KnowledgeCoffee
X-squared = 109.94, df = NA, p-value = 0.01397

Subscription - Amount of coffee consumed


    Pearson's Chi-squared test

data:  Subscription_Likely and AmountWeek
X-squared = 311.13, df = 306, p-value = 0.4078

    Pearson's Chi-squared test with simulated p-value (based on 500
    replicates)

data:  Subscription_Likely and AmountWeek
X-squared = 311.13, df = NA, p-value = 0.4411

Subscription - Specialty coffee frequency


    Pearson's Chi-squared test

data:  Subscription_Likely and Frequency_Specialty
X-squared = 102.57, df = 36, p-value = 0.00000002601

    Pearson's Chi-squared test with simulated p-value (based on 500
    replicates)

data:  Subscription_Likely and Frequency_Specialty
X-squared = 102.57, df = NA, p-value = 0.001996

Subscription - Brand change


    Pearson's Chi-squared test

data:  Subscription_Likely and BrandChange
X-squared = 38.718, df = 27, p-value = 0.06719

    Pearson's Chi-squared test with simulated p-value (based on 500
    replicates)

data:  Subscription_Likely and BrandChange
X-squared = 38.718, df = NA, p-value = 0.0978

Subscription - Purchase location


    Pearson's Chi-squared test

data:  Subscription_Likely and PurchaseLocation
X-squared = 61.31, df = 27, p-value = 0.0001772

    Pearson's Chi-squared test with simulated p-value (based on 500
    replicates)

data:  Subscription_Likely and PurchaseLocation
X-squared = 61.31, df = NA, p-value = 0.001996

Relationships


Regressions


==================================================================
                                 Dependent variable:              
                    ----------------------------------------------
                                 Subscription_Likely              
                              (1)                    (2)          
------------------------------------------------------------------
KnowledgeCoffee            0.324***                0.325***       
                            (0.087)                (0.088)        
                                                                  
Purchase_Fairtrade         0.384***                               
                            (0.147)                               
                                                                  
AmountWeek                                         -0.027*        
                                                   (0.015)        
                                                                  
MoneyCoffee                                         0.016*        
                                                   (0.009)        
                                                                  
Constant                     0.787                 2.108***       
                            (0.700)                (0.586)        
                                                                  
------------------------------------------------------------------
Observations                  235                    235          
R2                           0.084                  0.077         
Adjusted R2                  0.076                  0.065         
Residual Std. Error    2.633 (df = 232)        2.649 (df = 231)   
F Statistic         10.601*** (df = 2; 232) 6.406*** (df = 3; 231)
==================================================================
Note:                                  *p<0.1; **p<0.05; ***p<0.01
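
The regression table can be read with two small calculations: the adjusted R2 follows from R2, the number of observations n and the number of predictors k as 1 - (1 - R2)(n - 1)/(n - k - 1), and a fitted value is the constant plus each coefficient times its predictor. A minimal sketch using the figures from the table; the respondent values passed to the prediction are hypothetical:

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def predict_model_1(knowledge, fairtrade):
    """Fitted Subscription_Likely from the model (1) coefficients in the table."""
    return 0.787 + 0.324 * knowledge + 0.384 * fairtrade


adj_1 = adjusted_r2(0.084, n=235, k=2)  # model (1)
adj_2 = adjusted_r2(0.077, n=235, k=3)  # model (2)
print(round(adj_1, 3), round(adj_2, 3))

# Hypothetical respondent: KnowledgeCoffee = 5, Purchase_Fairtrade = 3.
fitted = predict_model_1(knowledge=5, fairtrade=3)
print(round(fitted, 3))
```

Both models explain less than 10% of the variance in Subscription_Likely, so fitted values of this kind are rough tendencies, not individual forecasts.
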
Variance inflation factors, model (1)

   KnowledgeCoffee Purchase_Fairtrade 
          1.000477           1.000477 

Variance inflation factors, model (2)

KnowledgeCoffee      AmountWeek     MoneyCoffee 
       1.016549        1.068581        1.072962
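
Assuming the values that follow the regression table are variance inflation factors (VIF), each one equals 1 / (1 - R2_j), where R2_j comes from regressing predictor j on the other predictors. With exactly two predictors, R2_j is the squared correlation between them, which is why both values for model (1) are identical. A minimal sketch that converts between the two quantities (function names are illustrative):

```python
import math


def vif_from_r_squared(r2):
    """VIF of a predictor whose auxiliary regression has R-squared r2."""
    return 1 / (1 - r2)


def r_squared_from_vif(vif):
    """Invert the VIF formula to recover the auxiliary R-squared."""
    return 1 - 1 / vif


# Model (1): KnowledgeCoffee and Purchase_Fairtrade share the same VIF.
r2_model_1 = r_squared_from_vif(1.000477)
implied_correlation = math.sqrt(r2_model_1)  # magnitude only, the sign is not recoverable
print(round(r2_model_1, 6), round(implied_correlation, 4))

# Largest VIF in model (2), reported for MoneyCoffee.
r2_money = r_squared_from_vif(1.072962)
print(round(r2_money, 4))
```

All values are close to 1, far below the common rule-of-thumb thresholds of 5 or 10, so multicollinearity does not appear to be a concern in either model.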